A shortest-path graph kernel for estimating gene product semantic similarity

نویسندگان

  • Marco A. Alvarez
  • Xiaojun Qi
  • Changhui Yan
چکیده

BACKGROUND Existing methods for calculating semantic similarity between gene products using the Gene Ontology (GO) often rely on external resources, which are not part of the ontology. Consequently, changes in these external resources like biased term distribution caused by shifting of hot research topics, will affect the calculation of semantic similarity. One way to avoid this problem is to use semantic methods that are "intrinsic" to the ontology, i.e. independent of external knowledge. RESULTS We present a shortest-path graph kernel (spgk) method that relies exclusively on the GO and its structure. In spgk, a gene product is represented by an induced subgraph of the GO, which consists of all the GO terms annotating it. Then a shortest-path graph kernel is used to compute the similarity between two graphs. In a comprehensive evaluation using a benchmark dataset, spgk compares favorably with other methods that depend on external resources. Compared with simUI, a method that is also intrinsic to GO, spgk achieves slightly better results on the benchmark dataset. Statistical tests show that the improvement is significant when the resolution and EC similarity correlation coefficient are used to measure the performance, but is insignificant when the Pfam similarity correlation coefficient is used. CONCLUSIONS Spgk uses a graph kernel method in polynomial time to exploit the structure of the GO to calculate semantic similarity between gene products. It provides an alternative to both methods that use external resources and "intrinsic" methods with comparable performance.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The All-Paths and Cycles Graph Kernel

With the recent rise in the amount of structured data available, there has been considerable interest in methods for machine learning with graphs. Many of these approaches have been kernel methods, which focus on measuring the similarity between graphs. These generally involving measuring the similarity of structural elements such as walks or paths. Borgwardt and Kriegel [1] proposed the all-pa...

متن کامل

A semantic relatedness metric based on free link structure

While shortest paths in WordNet are known to correlate well with semantic similarity, an is-a hierarchy is less suited for estimating semantic relatedness. We demonstrate this by comparing two free scale networks ( ConceptNet and Wikipedia) to WordNet. Using the Finkelstein353 dataset we show that a shortest path metric run on Wikipedia attains a better correlation than WordNet-based metrics. C...

متن کامل

A semantic relatedness metric based on free link structure (short paper)

While shortest paths in WordNet are known to correlate well with semantic similarity, an is-a hierarchy is less suited for estimating semantic relatedness. We demonstrate this by comparing two free scale networks ( ConceptNet and Wikipedia) to WordNet. Using the Finkelstein353 dataset we show that a shortest path metric run on Wikipedia attains a better correlation than WordNet-based metrics. C...

متن کامل

Shortest-Path Graph Kernels for Document Similarity

In this paper, we present a novel document similarity measure based on the definition of a graph kernel between pairs of documents. The proposed measure takes into account both the terms contained in the documents and the relationships between them. By representing each document as a graph-of-words, we are able to model these relationships and then determine how similar two documents are by usi...

متن کامل

Characterisation of semantic similarity on gene ontology based on a shortest path approach

Semantic similarity defined on Gene Ontology (GO) aims to provide the functional relationship between different GO terms. In this paper, a novel method, namely the Shortest Path (SP) algorithm, for measuring the semantic similarity on GO terms is proposed based on both GO structure information and the term's property. The proposed algorithm searches for the shortest path that connects two terms...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره 2  شماره 

صفحات  -

تاریخ انتشار 2011